Abstract: The relation between the rate distortion function (RDF)
and Bayesian filtering theory is discussed. The relation is
established by imposing a causal or realizability constraint on the
reconstruction conditional distribution of the RDF, leading to the
definition of a causal RDF. Existence of the optimal reconstruction
distribution of the causal RDF is shown using the topology
of weak convergence of probability measures. The optimal
non-stationary causal reproduction conditional distribution of the
causal RDF is derived in closed form; it is given by a set of
recursive equations which are computed backward in time. The
realization of the causal RDF is described via the source-channel
matching approach, while an example is briefly discussed to
illustrate the concepts.

I. INTRODUCTION 

Shannon's information theory for reliable communication
evolved over the years without much emphasis on real-time
realizability or causality imposed on the communication
sub-systems. In particular, the classical rate distortion function
(RDF) for source data compression deals with the characterization
of the optimal reconstruction conditional distribution
subject to a fidelity criterion (T\, fl], without regard for 
realizability. Hence, coding schemes which achieve the RDF 
are not realizable. 

On the other hand, filtering theory is developed by imposing 
real-time realizability on estimators with respect to measure- 
ment data. Specifically, least-squares filtering theory deals 
with the characterization of the conditional distribution of 
the unobserved process given the measurement data, via a 
stochastic differential equation which causally depends on the 
observation data. 

Although both reliable communication and filtering (state
estimation for control) are concerned with the reconstruction 
of processes, the main underlying assumptions characterizing 
them are different. There are, however, examples in which 
the gap between the two disciplines in both the underlying 
assumption and the form of reconstruction is bridged |T1, iQ, 
El. 0, I©. In information theory, the real-time realizability 
or causality of a communication system is addressed via
joint source-channel coding [7] (for memoryless channels and
sources).

Historically, the work of R. Bucy [8] appears to be the first
to consider the direct relation between the distortion rate function
and filtering, by carrying out the computation of a realizable
distortion rate function with square criteria for two samples
of the Ornstein-Uhlenbeck process. The earlier work of A. K.
Gorbunov and M. S. Pinsker [9] on ε-entropy defined via
a causal constraint on the reproduction distribution of the
RDF, although not directly related to the realizability question
pursued by Bucy, computes the causal RDF for stationary
Gaussian processes via power spectral densities. The realizability
constraints imposed on the reproduction conditional
distribution in [8] and [9] are different. Moreover, the actual
computation of the distortion rate or RDF in these works is based
on the Gaussianity of the process, while no general theory is
developed to handle arbitrary processes.
The objective of this paper is to develop the general theory 
by further investigating the connection between realizable rate 
distortion theory and filtering theory for general distortion 
functions and random processes on abstract Polish spaces. The 
connection is established via optimization on the spaces of 
conditional distributions which satisfy a causality constraint 
and an average distortion constraint. 
The main results obtained are the following. 

a) Existence of the optimal reconstruction distribution
minimizing the causal RDF, using the topology of weak
convergence of probability measures on Polish spaces.

b) Closed form expression of the optimal reconstruction
conditional distribution for non-stationary processes, via
recursive equations computed backward in time.

c) Realization procedure of the filter based on the causal
RDF.

d) Example to demonstrate the realization of the filter.

Although the operational meaning of the causal RDF in terms
of causal and sequential codes is not pursued, it is pointed out
that by utilizing the assumptions and coding theorem derived
in [10], the causal RDF derived is the optimal performance
theoretically achievable (OPTA) for sequential codes, while it
is related to the OPTA for causal codes [11].

Next, we give a high level discussion on RDF and filtering 
theory, and discuss their connection. 

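Consider a discrete-time process $X^n \triangleq \{X_0, X_1, \ldots, X_n\} \in
\mathcal{X}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{X}_i$, and its reconstruction
$Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} \triangleq
\times_{i=0}^{n} \mathcal{Y}_i$, where $\mathcal{X}_i$ and $\mathcal{Y}_i$
are Polish spaces.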

Bayesian Estimation Theory. In classical filtering, one is
given a mathematical model that generates the process $X^n$,
$\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, 1, \ldots, n\}$, often induced via
discrete-time recursive dynamics, and a mathematical model that
generates observed data obtained from sensors, say, $Z^n$,
$\{P_{Z_i|Z^{i-1},X^i}(dz_i|z^{i-1},x^i) : i = 0, 1, \ldots, n\}$, while
$\{\hat{X}_i : i = 0, 1, \ldots, n\}$ are
the causal estimates of some function of the process $X^n$ based
on the observed data $Z^n$. The classical Kalman Filter is a
well-known example [12], where $\hat{X}_i = \mathbb{E}[X_i|Z^{i-1}]$,
$i = 0, 1, \ldots, n$, is the conditional mean which minimizes the
average least-squares estimation error. Thus, in classical
filtering theory both models which generate the unobserved and
observed processes, $X^n$ and $Z^n$, respectively, are given a
priori. Fig. 1 is the block diagram of the filtering problem.
Fig. 1. Block Diagram of Filtering Problem
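
To make the causal conditional-mean estimate concrete, the following is a minimal sketch of a one-step-ahead Kalman predictor. It assumes a scalar linear-Gaussian model, $X_{i+1} = aX_i + W_i$, $Z_i = cX_i + V_i$, which the text does not specify; the function name and the parameters `a`, `c`, `q`, `r` are hypothetical. The point it illustrates is that the estimate of $X_i$ is formed from observations up to time $i-1$ only.

```python
def kalman_predictor(zs, a, c, q, r, x0_mean=0.0, p0=1.0):
    """One-step-ahead Kalman predictor for the assumed scalar model
    X_{i+1} = a X_i + W_i,  Z_i = c X_i + V_i,
    with Var(W_i) = q and Var(V_i) = r (hypothetical parameters)."""
    x_hat, p = x0_mean, p0          # prior mean and variance of X_0
    estimates, variances = [], []
    for z in zs:
        # x_hat is the estimate of X_i built from z_0, ..., z_{i-1} only.
        estimates.append(x_hat)
        variances.append(p)
        gain = a * p * c / (c * c * p + r)
        x_hat = a * x_hat + gain * (z - c * x_hat)   # absorb z_i, predict X_{i+1}
        p = a * a * p - gain * c * p * a + q         # Riccati recursion
    return estimates, variances


estimates, variances = kalman_predictor([1.0] * 200, a=0.9, c=1.0, q=1.0, r=1.0)
print(estimates[0], variances[-1])
```

The first estimate equals the prior mean because no observation is available yet, and the error variance settles at the fixed point of the Riccati recursion.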

Causal Rate Distortion Theory and Estimation. In causal rate
distortion theory one is given a distribution for the process $X^n$,
which induces $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, 1, \ldots, n\}$, and
determines the causal reconstruction conditional distribution
$\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$ which
minimizes the mutual information between $X^n$ and $Y^n$ subject
to a distortion or fidelity constraint and a causal (realizability)
constraint. The filter $\{Y_i : i = 0, 1, \ldots, n\}$ of
$\{X_i : i = 0, 1, \ldots, n\}$ is found by realizing the reconstruction
distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$ via a
cascade of sub-systems as shown in Fig. 2.
Fig. 2. Block Diagram of Filtering via Causal Rate Distortion Function

The precise problem formulation necessitates the definitions
of distortion function or fidelity, and mutual information.
The distortion function or fidelity between $x^n$ and its
reconstruction $y^n$ is a measurable function defined by

$$d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i)$$

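The mutual information between $X^n$ and $Y^n$, for a
given distribution $P_{X^n}(dx^n)$ and conditional distribution
$P_{Y^n|X^n}(dy^n|x^n)$, is defined by

$$I(X^n; Y^n) \triangleq \int \log\Big(\frac{P_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \tag{I.1}$$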



The realizability constraint is introduced next. Define the
causal (n + 1)-fold convolution measure

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) \ \text{- a.s.} \tag{I.2}$$

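The realizability constraint for a causal filter is defined by

$$\overrightarrow{Q}_{ad} \triangleq \Big\{ P_{Y^n|X^n}(dy^n|x^n) : P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \ \text{- a.s.} \Big\} \tag{I.3}$$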



The realizability condition (I.3) is necessary, otherwise the
connection between filtering and realizable rate distortion
theory cannot be established. This is due to the fact that
$P_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^n}(dy_i|y^{i-1},x^n)$ - a.s.,
and hence in general, for each $i = 0, 1, \ldots, n$, the
conditional distribution of $Y_i$ depends on future symbols
$\{X_{i+1}, X_{i+2}, \ldots, X_n\}$ in addition to the past and present
symbols $(Y^{i-1}, X^i)$.

Causal RDF. The causal RDF is defined by

$$R^c_{0,n}(D) \triangleq \inf_{P_{Y^n|X^n}(dy^n|x^n) \in \overrightarrow{Q}_{ad} :\ E\{d_{0,n}(X^n,Y^n)\} \le D} I(X^n; Y^n) \tag{I.4}$$



Note that realizability condition (I.3) is different from the
realizability condition in [8], which is defined under the
assumption that $Y_i$ is independent of
$X^*_{j|i} \triangleq X_j - \mathbb{E}[X_j|X^i]$, $j = i+1, i+2, \ldots$.
The claim here is that realizability condition
(I.3) is more natural and applies to processes which are not
necessarily Gaussian having square error distortion function.
Realizability condition (I.3) is weaker than the causality
condition found in [9] defined by $X_{n+1}^{\infty} \leftrightarrow X^n \leftrightarrow Y^n$.
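The point to be made regarding (I.4) is that the realizability
constraint $P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$ - a.s., is
equivalent to the following (see also Lemma 2.1):

$$I(X^n; Y^n) = \int \log\Big(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \equiv \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \tag{I.5}$$

where $\mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ indicates the functional dependence
of $I(X^n; Y^n)$ on $\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$.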

Therefore, by finding the solution of (I.4), one can realize
it via a channel from which one can construct an optimal filter
causally as in Fig. 2.

This paper is organized as follows. Section II discusses the
formulation on abstract spaces. Section III establishes existence
of the optimal minimizing distribution, and Section IV derives
the non-stationary solution recursively. Section V describes
the realization of the causal RDF, while Section VI provides
an example. Lengthy derivations are omitted due to space
limitation.

II. CAUSAL RDF ON ABSTRACT SPACES 

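The source and reconstruction alphabets are sequences of
Polish spaces [13] as defined in the previous section. Probability
distributions on any measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are
denoted by $\mathcal{M}_1(\mathcal{Z})$. It is assumed that the σ-algebras
$\sigma\{X^{-1}\} = \sigma\{Y^{-1}\} = \{\emptyset, \Omega\}$. For
$(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ measurable
spaces, the set of conditional distributions $P_{Y|X}(\cdot|X = x)$ is
denoted by $Q(\mathcal{Y}; \mathcal{X})$ and it is equivalent to stochastic kernels.
Mutual information is defined via the Kullback-Leibler distance:

$$I(X^n; Y^n) \triangleq \mathbb{D}(P_{X^n,Y^n} \| P_{X^n} \times P_{Y^n})
= \int \log\Big(\frac{P_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)
= \int \mathbb{D}(P_{Y^n|X^n}(\cdot|x^n) \| P_{Y^n}(\cdot)) P_{X^n}(dx^n)
\equiv \mathbb{I}(P_{X^n}, P_{Y^n|X^n}) \tag{II.1}$$

Note that (II.1) states that mutual information is expressed as
a functional of $\{P_{X^n}, P_{Y^n|X^n}\}$.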
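
The two expressions for mutual information in (II.1) can be checked numerically on finite alphabets. The sketch below is only an illustration: the function names, the two-letter alphabets, and the numerical values of `p_x` and `kernel` are assumptions, not taken from the paper. It computes the relative entropy between the joint law and the product of its marginals, and separately the source-average of the relative entropy between the kernel and the output marginal.

```python
import math


def mi_kernel_form(p_x, kernel):
    """Average over x of D(P_{Y|X}(.|x) || P_Y), in bits.
    p_x[x] is the source law and kernel[x][y] = P(y | x)."""
    n_x, n_y = len(p_x), len(kernel[0])
    p_y = [sum(p_x[x] * kernel[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(
        p_x[x] * kernel[x][y] * math.log2(kernel[x][y] / p_y[y])
        for x in range(n_x) for y in range(n_y)
        if p_x[x] > 0 and kernel[x][y] > 0
    )


def mi_joint_form(p_x, kernel):
    """Relative entropy D(P_{X,Y} || P_X x P_Y), in bits."""
    n_x, n_y = len(p_x), len(kernel[0])
    joint = [[p_x[x] * kernel[x][y] for y in range(n_y)] for x in range(n_x)]
    p_y = [sum(joint[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(
        joint[x][y] * math.log2(joint[x][y] / (p_x[x] * p_y[y]))
        for x in range(n_x) for y in range(n_y)
        if joint[x][y] > 0
    )


# Hypothetical source law and reconstruction kernel.
p_x = [0.3, 0.7]
kernel = [[0.9, 0.1], [0.2, 0.8]]
print(mi_kernel_form(p_x, kernel), mi_joint_form(p_x, kernel))
```

Both functions take only the pair (source law, kernel) as input, which is the functional dependence that (II.1) emphasizes.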

The next lemma (stated without proof) relates causal product
conditional distributions and conditional independence.
Lemma 2.1: The following are equivalent.

1) $P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$ - a.s.

2) For each $i = 0, 1, \ldots, n-1$, $Y_i \leftrightarrow (X^i, Y^{i-1}) \leftrightarrow
(X_{i+1}, \ldots, X_n)$ forms a Markov chain.

3) For each $i = 0, 1, \ldots, n-1$, $Y^i \leftrightarrow X^i \leftrightarrow X_{i+1}$ forms
a Markov chain.
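
The conditional independence in Lemma 2.1 can be illustrated on a toy case with two time steps and binary alphabets. The sketch below is not from the paper: the source law `p_x`, the kernels, and all function names are hypothetical. It builds the joint law from a first-stage kernel and a second-stage kernel, then checks whether the conditional law of $Y_0$ given $(X_0, X_1)$ is free of $X_1$.

```python
from itertools import product

BITS = (0, 1)


def build_joint(p_x, k0, k1):
    """Joint law P(x0, x1, y0, y1) = P(x0, x1) k0(y0 | x0, x1) k1(y1 | y0, x0, x1)."""
    return {
        (x0, x1, y0, y1): p_x[(x0, x1)] * k0(y0, x0, x1) * k1(y1, y0, x0, x1)
        for x0, x1, y0, y1 in product(BITS, repeat=4)
    }


def y0_ignores_future(joint, tol=1e-12):
    """Check Y_0 <-> X_0 <-> X_1: P(y0 | x0, x1) must not depend on x1."""
    for x0, y0 in product(BITS, repeat=2):
        cond = []
        for x1 in BITS:
            p_x0x1 = sum(joint[(x0, x1, a, b)] for a in BITS for b in BITS)
            p_x0x1y0 = sum(joint[(x0, x1, y0, b)] for b in BITS)
            cond.append(p_x0x1y0 / p_x0x1)
        if abs(cond[0] - cond[1]) > tol:
            return False
    return True


# Hypothetical source with memory and hypothetical kernels.
p_x = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
k1 = lambda y1, y0, x0, x1: 0.8 if y1 == x1 else 0.2        # last stage may use x0, x1, y0
k0_causal = lambda y0, x0, x1: 0.9 if y0 == x0 else 0.1     # first stage uses x0 only
k0_noncausal = lambda y0, x0, x1: 0.9 if y0 == x1 else 0.1  # first stage peeks at x1

causal_ok = y0_ignores_future(build_joint(p_x, k0_causal, k1))
noncausal_ok = y0_ignores_future(build_joint(p_x, k0_noncausal, k1))
print(causal_ok, noncausal_ok)
```

The kernel built as a causal product passes the check, while the kernel whose first stage looks at the future source symbol fails it.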

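According to Lemma 2.1, mutual information subject to
causality reduces to

$$I(X^n; Y^n) = \int \log\Big(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \equiv \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \tag{II.2}$$

where $P_{Y^n}(dy^n) = \int \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n)$,
and (II.2) states that $I(X^n; Y^n)$ is a functional of
$\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$. Hence, the causal RDF is defined by optimizing
$\mathbb{I}(P_{X^n}, P_{Y^n|X^n})$ over $P_{Y^n|X^n}$ subject to the realizability
constraint $P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$ - a.s., which
satisfies a distortion constraint.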

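Definition 2.2: (Causal Rate Distortion Function) Suppose
$d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i)$, where
$\rho_{0,i} : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \to [0, \infty)$
is a sequence of $\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable distortion
functions, and let $\overrightarrow{Q}_{0,n}(D)$ (assuming it is non-empty) denote
the average distortion or fidelity constraint defined by

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{ P_{Y^n|X^n} \in Q(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) :
\ell_{d_{0,n}}(P_{Y^n|X^n}) \triangleq \int d_{0,n}(x^n, y^n) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \le D \Big\} \cap \overrightarrow{Q}_{ad} \tag{II.3}$$

where $\overrightarrow{Q}_{ad}$ is the realizability constraint (I.3). The causal RDF
is defined by

$$R^c_{0,n}(D) \triangleq \inf_{P_{Y^n|X^n} \in \overrightarrow{Q}_{0,n}(D)} \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n}) \tag{II.4}$$

Clearly, $R^c_{0,n}(D)$ is characterized by minimizing mutual
information or equivalently $\mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ over $\overrightarrow{Q}_{0,n}(D)$.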

III. EXISTENCE OF OPTIMAL CAUSAL 
RECONSTRUCTION 

In this section, the existence of the minimizing causal
product kernel in (II.4) is established by using the topology of
weak convergence of probability measures on Polish spaces.
Let $BC(\mathcal{Y}_{0,n})$ denote the set of bounded continuous
real-valued functions on $\mathcal{Y}_{0,n}$. The assumptions required are the
following.

1) $\mathcal{Y}_{0,n}$ is a compact Polish space, $\mathcal{X}_{0,n}$ is a Polish space;

2) for all $h(\cdot) \in BC(\mathcal{Y}_{0,n})$, the function $(x^n, y^{n-1}) \in
\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1} \mapsto \int_{\mathcal{Y}_n} h(y) P_{Y|Y^{n-1},X^n}(dy|y^{n-1}, x^n) \in
\mathbb{R}$ is continuous jointly in the variables $(x^n, y^{n-1}) \in
\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1}$;

3) $d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$;

4) the distortion level $D$ is such that there exists a sequence
$(x^n, y^n) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}$ satisfying $d_{0,n}(x^n, y^n) < D$.

Note that since $\mathcal{Y}_{0,n}$ is assumed to be a compact Polish
space, $Q(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is weakly compact.

Lemma 3.1: Assume that conditions 1), 2) hold.
Then

1) The realizability constraint set $\overrightarrow{Q}_{ad}$ is a closed subset of
the weakly compact set $Q(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ (hence compact).

2) Under the additional conditions 3), 4) the set $\overrightarrow{Q}_{0,n}(D)$
is a closed subset of $\overrightarrow{Q}_{ad}$ (hence compact).

The previous results follow from Prohorov's theorem that
relates tightness and weak compactness.

The next theorem establishes existence of the minimizing
reconstruction kernel for (II.4); it follows from Lemma 3.1 and
the lower semicontinuity of $\mathbb{I}(P_{X^n}, \cdot)$ with respect to $P_{Y^n|X^n}$.

Theorem 3.2: Suppose the conditions of Lemma 3.1 hold.
Then $R^c_{0,n}(D)$ has a minimum.

IV. NON-STATIONARY OPTIMAL 
RECONSTRUCTION 

In this section the form of the optimal causal product
reconstruction kernels is derived under a non-stationarity assumption.
In computing the Gateaux differential, the (n + 1)-fold convolution
product $\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)$ should be varied in each direction
of $P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1}, x^i)$, $i = 0, 1, \ldots, n$.

Theorem 4.1: Suppose $\mathbb{I}_{P_{X^n}}(P_{Y_i|Y^{i-1},X^i} : i = 0, 1, \ldots, n)
\triangleq \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ is well defined for every
$\overrightarrow{P}_{Y^n|X^n} \in Q(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, possibly taking values from
the set $[0, \infty]$. Then $\{P_{Y_i|Y^{i-1},X^i} : i = 0, 1, \ldots, n\} \mapsto
\mathbb{I}_{P_{X^n}}(P_{Y_i|Y^{i-1},X^i} : i = 0, 1, \ldots, n)$ is Gateaux differentiable
at every point in $Q(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, and the Gateaux
derivative at the points $P_{Y_i|Y^{i-1},X^i}$ in each direction ...

The constrained problem defined by (II.4) can be reformulated
using Lagrange multipliers as follows (equivalence of
constrained and unconstrained problems follows from [14]).

$$\inf_{\overrightarrow{P}_{Y^n|X^n}} \{ \ldots \} \tag{IV.1}$$

and $s \in (-\infty, 0]$ is the Lagrange multiplier.

Note that $P_{Y_i|Y^{i-1},X^i} \in Q(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$ for each
factor of $\overrightarrow{P}_{Y^n|X^n} = \otimes_{j=0}^{n} P_{Y_j|Y^{j-1},X^j}$; therefore, one
should introduce another set of Lagrange multipliers to obtain
an optimization problem without constraints. This process is
involved, hence we state the main results.

General Recursions for Non-Stationary Optimal Reconstruction
For $k = 0, \ldots, n$,

$$g_{n-k,n}(x^{n-k}, y^{n-k}) \triangleq \ldots\, e^{s\rho_{0,n-k+1} - g_{n-k+1,n}}\, \ldots$$

the optimal reconstruction is given by ...

The causal RDF is given by ...

The above recursions illustrate the causality, since
$g_{n-k,n}(x^{n-k}, y^{n-k})$ appearing in the exponent of the
reconstruction distribution integrate out future reconstruction
distributions. Note also that for the stationary case all
reconstruction conditional distributions are the same and
hence, $g_{n-k,n}(\cdot, \cdot) = 0$, $k = 0, 1, \ldots, n$. The above recursions
are general, while depending on the application they can be
simplified considerably.

V. REALIZATION OF CAUSAL RDF

The realization of the causal RDF (optimal reconstruction
kernel) is equivalent to identifying a communication channel,
an encoder and a decoder such that the reconstruction from
the sequence $X^n$ to the sequence $Y^n$ matches the causal rate
distortion minimizing reconstruction kernel. Fig. 3 illustrates
the cascade sub-systems that realize the causal RDF. This is
called source-channel matching in information theory. It
is also described in [6] and [10]; this technique allows one
to design encoding/decoding schemes without encoding and
decoding delays. The realization of the optimal reconstruction
kernel is given below.

Definition 5.1: Given a source
$\{P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1}, y^{i-1}) : i = 0, \ldots, n\}$, a
channel $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1}, a^i) : i = 0, \ldots, n\}$
is a realization of the optimal reconstruction
distribution if there exists a pre-channel encoder
$\{\ldots : i = 0, \ldots, n\}$ and a post-channel decoder
$\{\ldots : i = 0, \ldots, n\}$ such that

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \ldots$$

where the joint distribution is ...

Fig. 3. Block Diagram of Realizable Causal Rate Distortion Function

The filter is given by $\{P_{X_i|B^{i-1}}(dx_i|b^{i-1}) : i = 0, \ldots, n\}$.

Thus, if $\{P_{B_i|B^{i-1},A^i}(db_i|b^{i-1}, a^i) : i = 0, \ldots, n\}$ is a
realization of the causal RDF minimizing distribution then
the channel connecting the source, encoder, channel, decoder
achieves the causal RDF, and the filter is obtained.

VI. EXAMPLE: BINARY MARKOV SOURCE 

Consider a binary Markov source, where the objective is to
detect consecutive sequences of 1's subject to a specific,
pre-defined distortion or error criterion. The Markov source
has the following transition probabilities.

$$P(x_i = 0 | x_{i-1} = 0) = 1 - p, \qquad P(x_i = 1 | x_{i-1} = 0) = p$$

$$P(x_i = 0 | x_{i-1} = 1) = q, \qquad P(x_i = 1 | x_{i-1} = 1) = 1 - q$$

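The steady state joint probabilities $P(x_i, x_{i-1})$ are given by

$$P(x_i = 0, x_{i-1} = 0) = \frac{q(1-p)}{p+q}, \qquad P(x_i = 0, x_{i-1} = 1) = \frac{pq}{p+q}$$

$$P(x_i = 1, x_{i-1} = 1) = \frac{p(1-q)}{p+q}, \qquad P(x_i = 1, x_{i-1} = 0) = \frac{pq}{p+q}$$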
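
The steady-state joint probabilities follow from the stationary marginal of the chain multiplied by the transition probabilities. The sketch below, with hypothetical function names, computes them in closed form and cross-checks the stationary marginal by iterating the transition probabilities from an arbitrary starting distribution.

```python
def stationary_joint(p, q):
    """Steady-state joint P(x_i, x_{i-1}), keyed by (x_i, x_prev)."""
    pi0, pi1 = q / (p + q), p / (p + q)   # stationary marginals of states 0 and 1
    return {
        (0, 0): pi0 * (1 - p),
        (1, 0): pi0 * p,
        (0, 1): pi1 * q,
        (1, 1): pi1 * (1 - q),
    }


def iterate_marginal(p, q, steps=2000):
    """Independent check: push a starting distribution through the chain repeatedly."""
    m0, m1 = 0.5, 0.5
    for _ in range(steps):
        m0, m1 = m0 * (1 - p) + m1 * q, m0 * p + m1 * (1 - q)
    return m0, m1


joint = stationary_joint(0.55, 0.45)
print(joint, iterate_marginal(0.55, 0.45))
```

The values p = 0.55 and q = 0.45 are the ones used for Fig. 4; any pair with 0 < p, q < 1 works the same way.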

The distortion function is described in Table I. 
TABLE I
Distortion: $d(x_i, x_{i-1}, y_i)$

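For the given distortion measure the optimal reconstruction
kernel has the following form

$$P^*(y_i | x_i, x_{i-1}) = \frac{e^{s d(x_i, x_{i-1}, y_i)} P^*(y_i | y_{i-1})}{\sum_{y_i} e^{s d(x_i, x_{i-1}, y_i)} P^*(y_i | y_{i-1})}$$

in which $P^*(y_i | y_{i-1}) = P^*(y_i)$. The Lagrange parameter $s$ is
the slope of the causal RDF. Then

$$P^*(1|0,0) = P^*(1|0,1) = P^*(1|1,0) = 1 - \alpha$$

$$P^*(0|0,0) = P^*(0|0,1) = P^*(0|1,0) = \alpha$$

$$P^*(0|1,1) = 1 - P^*(1|1,1) = 1 - \beta$$

$$P^*(y_i = 0) = 1 - P^*(y_i = 1) = \gamma$$

where

$$\alpha = \frac{(1-D)(q - Dp - Dq + pq)}{q(1-2D)(1+p)}, \qquad
\beta = \frac{(1-D)(Dp - p + Dq + pq)}{p(1-2D)(q-1)}, \qquad
\gamma = \frac{q - Dp - Dq + pq}{(p+q)(1-2D)}.$$

The causal RDF is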



$$R^c(D) = \ldots$$

$$\min_{y_i} \sum_{x_i, x_{i-1}} P(x_i, x_{i-1})\, d(x_i, x_{i-1}, y_i) = \min\Big\{ \frac{q(1+p)}{p+q},\ \frac{p(1-q)}{p+q} \Big\}$$
The filter which realizes the optimal reproduction kernel
$P^*(\cdot|\cdot,\cdot)$ via the specification of an encoder, channel and
decoder which achieves the causal RDF, $R^c(D)$, is described
in Q. 
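
The exponential form of the optimal kernel can be explored numerically with a Blahut-Arimoto style fixed-point iteration, which is a standard technique swapped in here and not the derivation used in the paper. The entries of Table I are not available in this text, so `assumed_distortion` below is a hypothetical stand-in: it charges a unit error whenever $y_i$ fails to flag the event $x_i = x_{i-1} = 1$. The function names and the value of `s` are likewise assumptions.

```python
import math


def assumed_distortion(x_i, x_prev, y):
    """Hypothetical stand-in for Table I: unit error if y misses the event x_i = x_prev = 1."""
    target = 1 if (x_i == 1 and x_prev == 1) else 0
    return 0.0 if y == target else 1.0


def optimal_kernel(p_joint, s, iters=500):
    """Iterate P(y | x_i, x_prev) proportional to exp(s * d) * P(y), with s <= 0,
    alternating with the update of the output marginal P(y)."""
    p_y = [0.5, 0.5]
    kernel = {}
    for _ in range(iters):
        for (x_i, x_prev) in p_joint:
            w = [math.exp(s * assumed_distortion(x_i, x_prev, y)) * p_y[y] for y in (0, 1)]
            z = sum(w)
            kernel[(x_i, x_prev)] = [w[0] / z, w[1] / z]
        p_y = [sum(p_joint[x] * kernel[x][y] for x in p_joint) for y in (0, 1)]
    return kernel, p_y


# Steady-state joint P(x_i, x_{i-1}) for p = 0.55, q = 0.45, keyed by (x_i, x_prev).
p_joint = {(0, 0): 0.2025, (1, 0): 0.2475, (0, 1): 0.2475, (1, 1): 0.3025}
kernel, p_y = optimal_kernel(p_joint, s=-2.0)
print(kernel, p_y)
```

Under this assumed distortion the three source pairs other than (1, 1) receive the same kernel row, which matches the structure of the kernel reported above.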

Special Case. Consider a special case when = i. Then 



$$R^c(D) = \begin{cases} 1 - H(D) & \text{if } D < \tfrac{1}{2} \\ 0 & \text{if } D \ge \tfrac{1}{2} \end{cases}$$




Fig. 4. $R^c(D)$ for p = 0.55 and q = 0.45



Note that the capacity of a binary symmetric channel with
error probability $\epsilon = D < \tfrac{1}{2}$ is precisely $C(\epsilon) = 1 - H(D)$
lIJ). Therefore, the realization of the reproduction kernel is 
given by the cascade of encoder, the binary symmetric channel,
and decoder such that the directed information including the
encoder but not the decoder operates at the capacity
$C(\epsilon) = 1 - H(D)$, and it is equal to the directed information from the
source to the decoder output. Utilizing the capacity achieving
encoder and decoder for the binary symmetric channel found
by Horstein in [15], the realization is completed.
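
The matching between the special-case causal RDF and the binary symmetric channel capacity can be checked numerically. The sketch below uses hypothetical function names and assumes, as the special case suggests, that the rate is zero once D reaches 1/2; logarithms are base 2, so rates are in bits.

```python
import math


def binary_entropy(d):
    """Binary entropy H(d) in bits, with H(0) = H(1) = 0."""
    if d in (0.0, 1.0):
        return 0.0
    return -d * math.log2(d) - (1 - d) * math.log2(1 - d)


def special_case_rdf(d):
    """Special-case causal RDF: 1 - H(D) for D < 1/2, and 0 otherwise (assumed)."""
    return 1.0 - binary_entropy(d) if d < 0.5 else 0.0


def bsc_capacity(eps):
    """Capacity of a binary symmetric channel with crossover probability eps."""
    return 1.0 - binary_entropy(eps)


for d in (0.0, 0.05, 0.11, 0.25, 0.5):
    print(d, special_case_rdf(d), bsc_capacity(d))
```

For every D below 1/2 the two quantities coincide, which is the matching condition that lets the channel with crossover probability D realize the reproduction kernel.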